Dissertation Memo
1 Introduction
1.1 Purpose
Since defending my prospectus, I’ve developed, deployed, and refined a raft of models that carry out my express research agenda. The object of this memo is to describe these models and both illustrate and comment on my central findings.
1.2 Research Questions
Before proceeding, it may be useful to recall the contents of my research agenda. To this end, below are concise formulations of the questions animating each chapter (in order):
- Do human rights conditions in preferential trade agreements (PTAs), particularly when codified as more legalized (i.e., “harder”), improve human rights respect among signatories?
- Do bilateral investment treaties (BITs) improve human rights respect among signatories?
- Do economic sanctions sent by Western states improve human rights respect among non-Western targets?
1.3 Key Findings
As we shall see, my models (as presently construed) suggest the following general answers to my research questions:
- “Harder” human rights conditions in PTAs are associated with improved human rights respect in less-democratic states but a worsening of such respect in more-democratic ones.
- Likewise, BITs are associated with improved human rights respect in less-democratic states but a worsening of such respect in more-democratic ones.
- Conversely, economic sanctions against non-Western states are associated with worsened human rights respect in less-democratic targets but an improvement of such respect in more-democratic ones.
1.4 Overview of Memo
The remainder of my memo is organized as follows:
2 The General Methodology
2.1 Summary
The following section delineates the overarching methodology common to my work in all three chapters:
- Replication (Section 2.2)
- Preprocessing (Section 2.3)
  - Data wrangling (Section 2.3.1)
  - Multiple imputation (Section 2.3.2)
  - Spatial lagging (Section 2.3.3)
  - Start-year specification (Section 2.3.4)
  - Temporal lagging (Section 2.3.5)
- Double/debiased machine learning (Section 2.4)
- Pooling (Section 2.5)
2.2 Replication
In each chapter, I first run models that approximately replicate those of the literature inspiring my work, the main points of departure (where appropriate) being (1.) the substitution of HR Scores for the authors’ outcomes and my selected treatments for theirs, and (2.) the use of both unimputed (in line with the authors’ methodologies) and multiply-imputed data. I do so to test whether the introduction of new key variables and/or multiple imputation immediately yields divergent results. More on these replications can be found in Section 3.3, Section 4.3, and Section 5.3.
2.3 Preprocessing
2.3.1 Data Wrangling
2.3.1.1 Treatments
Before running the replication and novel models, I assemble a dataset, \(\mathbf{S} = [\mathbf{Y \; D \; X}] \in \mathbb{R}^{N \times (1 + w + k)}\), through data wrangling. The first (and perhaps most involved) step in this process is the construction of the treatment variables, \(\mathbf{D} \in \mathbb{R}^{N \times w}\), a task which generally demands significant transformations to the raw data to ultimately render them in country-year (i.e., panel) format. How these treatments are sourced and generated—and how missingness is handled—is discussed further in Section 3.1, Section 4.1, and Section 5.1.
2.3.1.2 Covariates
Afterwards, I consolidate the outcome, \(\mathbf{Y} \in \mathbb{R}^{N \times 1}\), and covariates, \(\mathbf{X} \in \mathbb{R}^{N \times k}\). I also create additional covariates where necessary. \(\mathbf{X}\) is nearly identical across the three chapters, featuring the following common covariates:
- All 26 V-Dem high-level and mid-level indices.
- From the Polity Project, Polity Score and Regime Durability.1
- World Development Indicators (World Bank) pertaining to:2
- From Fariss et al. (2022), latent estimates of GDP, GDPpc, and population (logged).
- Balance of payments.5
1 The proximate sources of these variables are V-Dem and the Quality of Governance (QoG) dataset, respectively.
2 These variables are sourced from the QoG.
3 Sum of exports and imports of goods and services, % of country-year GDP.
4 Net inflows and outflows, % of country-year GDP.
5 % of country-year GDP.
Justifications for the inclusion of these covariates, as well as exceptions to the common elements of \(\mathbf{S}\), are present in Section 3.2, Section 4.2, and Section 5.2.
2.3.1.3 Case Selection & Coding
During the assembly of \(\mathbf{S}\), I make several decisions on handling observations featuring missingness in numerous or critical areas and irregular coding attributes. Most of these cases involve (1.) small-nation status, (2.) partial international recognition, and (3.) post-Cold War transitions.
In the first instance, I omit from \(\mathbf{S}\) all countries failing to appear in V-Dem, coverage in HR Scores notwithstanding. Constituting the lion’s share of these cases are microstates, as classified by the Correlates of War (COW) project. Though this omission comes at the cost of circumscribing my studies’ external validity—namely, to non-microstates—it likely improves the robustness of my final inferences vis-à-vis future replications. Indeed, V-Dem indices constitute the majority of dimensions in each chapter’s \(\mathbf{X}\), meaning conclusions drawn from imputed V-Dem data for microstates would be generally and greatly sensitive to the randomness immanent in the imputation process.6
6 More on my missingness handling method—multiple imputation—may be found in Section 2.3.2.
7 A discussion of my spatial models is to be found in Section 2.3.3.
8 For a discussion of Palestine’s exclusion, see the list’s attendant case description.
9 Only states with accompanying Gleditsch and Ward IDs (i.e., “state numbers”) appear in Fariss et al.’s (2022) data.
10 The computation of these treatments is outlined in Section 3.1 and Section 4.1.
Moreover, I omit Palestine in view of problems originating from inconsistent coding patterns. Unlike the foregoing microstates, Palestine is covered in V-Dem. However, its reported scores are disaggregated for much of the post-World War II period, with Gaza and the West Bank treated as mutually exclusive observations, making it unclear how to reconcile these data with the single spatial unit that the source of my spatial data, Natural Earth, supplies for Palestine.7 Equally important, Palestine, absent from Gleditsch & Ward’s List of Independent States (v7) on the basis of its contested international recognition,8 lacks the prerequisite for coverage in Fariss et al.’s (2022) GDP, GDPpc, and population variables,9 estimates essential for computing several treatments pre-imputation.10 Owing to these theoretical and logistical roadblocks, I abstain from incorporating Palestine into my models, with the understanding that doing so comes at some inferential cost.
Included in some of my models, on the other hand, are a few extinct countries, namely East Germany (formally the German Democratic Republic) and South Yemen (formally the People’s Democratic Republic of Yemen). Each was a communist state that unified with its non-communist counterpart (West Germany and North Yemen, respectively) in 1990, amidst the denouement of the Cold War. The two countries enjoy complete data for HR Scores, Fariss et al.’s (2022) estimates, and virtually all V-Dem covariates, inter alia—yet they crucially lack shapefiles from Natural Earth, which does not offer data on historical geographical boundaries on which to base spatial lags. In virtue of data availability, I opt to integrate East Germany and South Yemen into my non-spatial models, though this inevitably entails a greater number of cases in such models relative to that of my spatial ones. Always marginal (\(\Delta = 2\)), this difference is only present in models with a pre-1990 start year,11 before the two countries naturally “drop out” of the dataset.
11 Start years are discussed in greater detail in Section 2.3.4.
A further consideration involving post-Cold War geopolitical reconfigurations is the existence of several discrepancies in case identification—namely, differences across datasets in the coding of continuities between predecessor and successor states. These present challenges for variable collation and subsequent inference; indeed, when joining datasets on country identifiers (e.g., COW codes), the discrepancies may yield unmerited duplicates or missingness, raising doubts over inferential validity in later modelling stages. To harmonize all inputs of \(\mathbf{S}\), I allow decisions adopted by HR Scores and/or V-Dem—the sources of my outcome and the preponderance of covariates—to override alternative coding schemes in the following cases:
- Czechia: coded as the successor of Czechoslovakia.
- Germany: coded as the successor of West Germany.
- Serbia: coded as the successor of Yugoslavia.
- Yemen: coded as the successor of North Yemen.
These decisions being taken, I arrive at my final set of cases. In general, the maximum number of countries featuring in the two-way fixed effects models is 176, whereas the equivalent for the spatial models is 174. For a full breakdown of these figures, as well as complete itemizations of the excluded and manually-coded cases discussed above, see appendices A (Section 7.1), B (Section 7.2), and C (Section 7.3).
2.3.2 Multiple Imputation
The aim of my next step is to handle remaining missingness in \(\mathbf{S}\). Of course, simple methods of doing so exist (e.g., listwise deletion and mean imputation); but these generally rest on the very strong assumption that the data are “missing completely at random” (MCAR), produce unwarrantedly large or small standard errors, or may prove wasteful.12 On account of these pitfalls, I opt for the more robust method of multiple imputation. Widely regarded as “the best general method to deal with incomplete data” (Van Buuren, 2018), multiple imputation broadly involves the following three steps:
12 Indeed, listwise deletion would result in the excision of about a third of all my observations per model, as suggested by the data presented in Appendix D (Figure 7). For more on “ad-hoc” solutions to handling missing data, see Van Buuren (2018).
- From an incomplete dataset, generating \(m > 1\) complete versions thereof.
- With each version, running select models to estimate some parameter(s) of interest.
- Under “Rubin’s rules,” pooling these estimates to arrive at a single estimate and accompanying variance estimate, thus allowing for causal inference.13
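The third step can be made concrete with a minimal sketch of Rubin's rules for a single coefficient. It is an illustration in Python rather than the R pipeline used for the dissertation, and the estimates and squared standard errors below are hypothetical values invented for the example. The sketch covers only the pooled estimate and its total variance, and it omits the degrees-of-freedom adjustment used for intervals and tests.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors via Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divides by m - 1)
    total = u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
    return q_bar, total


# Hypothetical coefficients and squared standard errors from m = 5 imputed datasets.
estimates = [0.10, 0.12, 0.08, 0.11, 0.09]
variances = [0.0004, 0.0005, 0.0004, 0.0006, 0.0005]

pooled_estimate, pooled_variance = pool_rubin(estimates, variances)
print(pooled_estimate, pooled_variance ** 0.5)  # pooled estimate and its standard error
```

The between-imputation term is what distinguishes the pooled variance from a simple average of the per-dataset variances: the more the estimates disagree across imputed datasets, the larger the pooled standard error.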
13 For more on the basics of multiple imputation, see Van Buuren (2018).
14 For more on MCAR, MAR, and MNAR, see Van Buuren (2018).
Through this process, multiple imputation provides more realistic standard errors and eschews wasting data. It also need not assume MCAR. Specifically, multiple imputation is equipped to handle data that are either “missing at random” (MAR) or “missing not at random” (MNAR) (Van Buuren, 2018). MAR prevails when MCAR may be assumed conditional on knowing what factor(s) explain the missingness at issue, and thus rests on safer footing than MCAR alone. MNAR, meanwhile, exists when such explanatory factors are unknown.14 Carrying out multiple imputation on MNAR data generally requires researchers to assemble more variables explaining the missingness, so as to establish MAR, and/or to conduct an array of sensitivity analyses (Van Buuren, 2018). Lacking obvious evidence of MNAR (particularly after removing the cases described in Section 2.3.1), and guided by the advice that “[t]he MAR assumption is often a suitable starting point” (Van Buuren, 2018), I proceed by naively presupposing MAR conditions.
In all three chapters, I select \(m = 5\). Von Hippel (2009) and White et al. (2011) jointly suggest the following “rule of thumb” when setting \(m\): “the number of imputations should be similar to the percentage of cases that are incomplete” (Van Buuren, 2018). For each chapter and start year, the proportion of observations featuring a missing value is about 30%, suggesting an \(m\) in the vicinity of 30. Undercutting this rule of thumb, however, are a few practical difficulties, as Van Buuren observes. First, the recommended value of \(m\) is likely to be large for datasets with large numbers of variables, and indeed can only increase as dimensions are added, since observations are likelier to feature at least one missing value the more variables they possess. Second, and relatedly, increasing \(m\) correspondingly increases the computational and storage costs of the analysis, as each model must be fit on each of the \(m\) imputed datasets. Van Buuren (2018) instead recommends setting \(m\) to the average missing data rate (i.e., the percentage of values missing from the dataset), or to 5 when computational and storage capabilities are more limited. In fact, the average missing data rate for each chapter and start year is approximately 2% (even when excluding variables that are complete by design, namely the outcome, treatments, and fixed effects), meaning \(m = 5\) is more than sufficient under the circumstances. For a complete summary of these missing data rates, see Appendix D (Section 7.4).
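The two quantities contrasted above (the share of incomplete cases and the average missing data rate) can diverge sharply, which the following minimal Python sketch illustrates. The toy dataset and both function names are hypothetical and serve only to show how each quantity is computed; they are not drawn from the dissertation's data or code.

```python
# Toy dataset: rows are observations, columns are variables, None marks a missing value.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, None, 2.5, 3.5],
    [0.5, 1.0, 1.5, 2.0],
    [2.0, 2.5, None, 4.5],
    [3.0, 3.5, 4.0, 4.5],
]


def incomplete_case_share(data):
    """Share of observations with at least one missing value."""
    return sum(any(v is None for v in row) for row in data) / len(data)


def average_missing_rate(data):
    """Share of all individual values (cells) that are missing."""
    cells = [v for row in data for v in row]
    return sum(v is None for v in cells) / len(cells)


print(incomplete_case_share(rows))  # 2 of 5 rows are incomplete
print(average_missing_rate(rows))   # 2 of 20 cells are missing
```

Because one missing cell is enough to make a row incomplete, the first quantity grows with the number of variables while the second need not, which is the pattern behind the roughly 30% versus 2% figures reported above.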
To create the imputed data, I avail myself of the mice R package, specifically the futuremice function, which permits users to generate each imputed dataset \(\mathbf{m}_{i}\) in parallel across individual CPU cores, significantly reducing computation time. In addition to setting \(m\),15 I further specify the imputation method (random forests, equipped to handle both numeric and categorical data; Van Buuren, 2018) and the predictor variables.16 Finally, I set a seed to ensure reproducibility of the imputations. Implementing the function with these arguments ultimately yields five completed iterations of \(\mathbf{S}\): \((\mathbf{m}_{i})_{i=1}^5\). It is from these iterations that I conduct all remaining work, pooling the results obtained from each \(\mathbf{m}_{i}\) in the final stage.17
15 Technically, this is done by retaining the default setting \(m = 5\).
16 For more on the process of and algorithm for tree-based imputation, see Van Buuren (2018).
17 For more on pooling, see Section 2.5.
2.3.3 Spatial Lagging
In each chapter, I run two sets of models: one predicated on traditional two-way fixed effects, the other on spatial modeling. My preliminary plans for spatial modeling in chapters 2 and 3 are discussed in my prospectus; but at this juncture, it’s imperative to elaborate not only on why I’ve instead implemented spatial models for all three chapters, but also on the specific spatial model I’ve chosen.
The cardinal aim of spatial modeling is to control for spatial dependencies between observations in geographically-structured data, where traits of unit \(i\) may correlate with or affect those of neighboring unit \(j\) and perhaps vice versa. In such contexts, these dependencies may jeopardize valid causal inference by undermining the assumption that observations are independent and identically distributed (IID), which undergirds linear regression models (Rüttenauer, 2022, p. 729). The data at my disposal (country-year panel data) manifestly feature a geographical dynamic, with some observations being spatially nearer to or farther away from others. In addition, spatial dependencies, irrespective of the mechanism(s) producing them,18 have been observed in country-year data in sundry dimensions. These include poverty, inequality, corruption, pollution, levels of democracy and wealth, and policy emulation more generally (Ward & Gleditsch, 2008, pp. 2-11).19 The ubiquity of spatial dependencies in country-year data and the hazard they pose to valid inference jointly merit, in my view, the application of spatial models wherever such data are of interest (that is, throughout my dissertation), at least as a supplement to traditional, non-spatial methods.
18 These may include spillovers or clustering (whether explained or unexplained) of outcomes, covariates, or unobservables (i.e., the error term). For more on these potential mechanisms, see Cook et al. (2023, p. 62).
19 For additional examples of spatial dependencies in country-year data, see Cook et al. (2023, p. 62) and Wimpy et al. (2021, p. 723).
Of these models, the spatially-lagged X (SLX) model is what I select over other candidate models. I do so because recent literature on spatial modeling, including Rüttenauer (2024; 2022) and Wimpy et al. (2021), has begun to recommend SLX over its counterparts—particularly the spatial autoregressive (SAR) model, which Rüttenauer notes is “by far the most prominent spatial specification” (p. 7).
The SAR model, which incorporates an endogenous, spatially-lagged form of the outcome (\(y\)) as a covariate, gives rise by design to system-wide “global effects”: effects beyond first-order (i.e., nearest) neighbors, where an original unit’s \(y\) affects not only its first-order neighbor’s \(y\), but also, via the latter, the \(y\) of that neighbor’s own neighbors (and so on); and feedback effects, where changes in \(y\) in both higher- and first-order neighbors return to affect the original unit’s \(y\). The upshot of these global effects is that they complicate inference. On the one hand, they preclude interpretation of coefficients as ordinary marginal/partial effects (i.e., how a one-unit change in \(x\) impacts \(y\), ceteris paribus): both direct (\(x_{i} \rightarrow y_{i}\)) and indirect (\(x_{i} \rightarrow y_{j}\)) effects differ in value from the coefficients owing to the feedback-loop dynamic,20 meaning said coefficients cannot be interpreted as simple \(x \rightarrow y\) relationships in either sense. On the other, the assumption of global dependence is not invariably appropriate: some spatial processes may not extend to the entire system (e.g., the international level, as the regional clustering of democracy, wealth, and other variables generally suggests),21 or may take time to become fully realized, even though SAR diffuses global effects exactly at time \(t\).22
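For reference, the feedback dynamic can be made explicit with the textbook form of the SAR model. The notation here is an assumption not used elsewhere in this memo: \(\rho\) denotes the spatial autoregressive parameter and \(\mathbf{I}\) the identity matrix, with \(|\rho| < 1\) and a row-normalized \(\mathbf{W}\) assumed so that the series converges.

\[\mathbf{Y} = \rho\mathbf{WY} + \mathbf{X}\boldsymbol{\beta} + \epsilon \quad \Longrightarrow \quad \mathbf{Y} = (\mathbf{I} - \rho\mathbf{W})^{-1}(\mathbf{X}\boldsymbol{\beta} + \epsilon), \qquad (\mathbf{I} - \rho\mathbf{W})^{-1} = \mathbf{I} + \rho\mathbf{W} + \rho^{2}\mathbf{W}^{2} + \ldots\]

The terms in \(\mathbf{W}^{2}\), \(\mathbf{W}^{3}\), and beyond carry effects to higher-order neighbors and back to the originating unit, which is why the marginal effect of the \(k\)-th covariate is \((\mathbf{I} - \rho\mathbf{W})^{-1}\beta_{k}\) rather than \(\beta_{k}\) alone.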
20 Namely, as a result of the spatial multiplier matrix.
21 Indeed, Wimpy et al. find that global dependencies are “relatively rare in political science” (2021, p. 737).
22 For more on the implications of SAR’s construction, including mathematical proofs thereof, see Rüttenauer (2024, pp. 11-14; 2022, pp. 731-736), and Wimpy et al. (2021, pp. 723-724).
By contrast, the SLX model is highly flexible and easy to interpret. It begins with the more conservative assumption of local dependence (i.e., relationships between first-order neighbors), though researchers may still account for more global (i.e., higher-order) dependencies should they desire to do so. Also, by eschewing global dependence as a starting point, SLX is better suited for panel data, where spatial spillovers at time \(t\) may be confined to first- or low-order neighbors. Global effects being absent by default, SLX models need not entail feedback loops, enabling coefficients to be interpreted as ordinary, OLS-like marginal effects.23 Equally beneficial, SLX is fit for a wide range of inferential techniques, including machine-learning methods, since it merely involves introducing the spatial lags as an additional set of covariates. Finally, Rüttenauer (2022) and Wimpy et al. (2021) find that SLX is relatively robust to misspecifications of the spatially-dependent covariates.24 In view of these advantages, the authors posit that SLX is an apt starting point for model building, particularly absent strong theory implying that a different spatial model would be more appropriate.25
23 This point is stressed by Rüttenauer (2022, p. 735).
24 For more on their findings, see Rüttenauer (2022, p. 749) and Wimpy et al. (2021, p. 737).
25 The advantages of SLX are elegantly summarized in Rüttenauer (2024, p. 23) and Wimpy et al. (2021, p. 725).
26 Formula taken from Wimpy et al. (2021, p. 724).
SLX is of the basic form:26
\[\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{WZ}\boldsymbol{\theta} + \epsilon\]
Above, \(\mathbf{Y}\) is the outcome, \(\epsilon\) the error term, \(\mathbf{X}\) the matrix of independent variables, \(\boldsymbol{\beta}\) the corresponding coefficients, \(\mathbf{W}\) the matrix of first-order neighbor weights, \(\mathbf{Z}\) the matrix of covariates (or confounders) expected to have a spatial influence, and \(\boldsymbol{\theta}\) the spatial coefficients,27 capturing spillover effects from \(\mathbf{Z}\) covariates on the outcome.28
27 Wimpy et al. label \(\mathbf{X}\) and \(\mathbf{Z}\) differently to convey that their contents may vary (2021, p. 724). For example, \(\mathbf{X}\) may contain both the treatment(s) and covariates, whereas \(\mathbf{Z}\) may contain the latter, exclusively.
28 More on interpreting the SLX formula may be found in Rüttenauer (2024, p. 8).
29 That is, a two-dimensional matrix wherein each row sums to 1.
30 This matrix depiction is taken from Rüttenauer (2024, p. 3).
\(\mathbf{W}\), an \(N \times N\) row-normalized matrix,29 is of especial import, containing the base information needed to compute the “spatial lags”: versions of each covariate from \(\mathbf{Z} \in \mathbb{R}^{N \times v}\) that take the mean of any given observation’s neighbors, and the inclusion of which ultimately makes for an SLX model. The product \(\mathbf{WZ}\) denotes the spatial lags. \(\mathbf{W}\) is of the form:30
\[\mathbf{W} = \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} & \ldots & w_{1, n} \\ w_{2,1} & w_{2,2} & w_{2,3} & \ldots & w_{2, n} \\ w_{3,1} & w_{3,2} & w_{3,3} & \ldots & w_{3, n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n,1} & w_{n,2} & w_{n,3} & \ldots & w_{n, n} \end{bmatrix}\]
Here, each element \(w_{i,j}\) of \(\mathbf{W}\) gives the weight that unit \(j\)’s value receives in computing unit \(i\)’s spatial lag: under row normalization with binary neighbors, \(w_{i,j}\) equals 1 divided by \(i\)’s number of neighbors if \(j\) is a neighbor of \(i\), and 0 otherwise. As a simple yet relevant example, consider a country-year dataset comprising three countries (\(i\), \(j\), and \(k\)) and a single covariate, such that \(\mathbf{Z} = z_{v}\). For a select year \(t\), let \(i\) and \(j\) be neighbors, and let \(j\) and \(k\) be neighbors. The row-normalized neighbor-weight matrix would thus be:
\[\mathbf{W} = \left[ \begin{array}{c|ccc} & i & j & k \\ \hline i & 0 & 1 & 0 \\ j & 0.5 & 0 & 0.5 \\ k & 0 & 1 & 0 \end{array} \right]\]
Furthermore, let \(z_{v} = (i = 2, \; j = 4, \; k = 7.5)\). The spatial lags at \(t\) for countries \(i\), \(j\), and \(k\) on covariate \(z_{v}\) are thus found by:
\[\mathbf{W}z_{v}^{\; T} = \begin{bmatrix} 0 & 1 & 0 \\ 0.5 & 0 & 0.5 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \\ 7.5 \end{bmatrix} = \begin{bmatrix} 4 \\ 4.75 \\ 4 \end{bmatrix}\]
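The same three-country computation can be reproduced with a minimal Python sketch. It is an illustration only, not the spdep-based R workflow, and the function names `row_normalize` and `spatial_lag` are hypothetical helpers. The binary adjacency matrix encodes the neighbor relations stated above (\(i\) with \(j\), and \(j\) with \(k\)).

```python
def row_normalize(adjacency):
    """Divide each row by its sum so that every row with neighbors sums to 1."""
    weights = []
    for row in adjacency:
        total = sum(row)
        weights.append([value / total if total else 0.0 for value in row])
    return weights


def spatial_lag(weights, z):
    """Weighted average of neighbors' values for each unit (the product Wz)."""
    return [sum(w * value for w, value in zip(row, z)) for row in weights]


# Rows and columns are ordered i, j, k; 1 marks a neighbor pair.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
z = [2.0, 4.0, 7.5]

W = row_normalize(adjacency)
lags = spatial_lag(W, z)
print(lags)  # [4.0, 4.75, 4.0]
```

Country \(j\) has two neighbors, so each receives a weight of 0.5, whereas \(i\) and \(k\) each have a single neighbor whose value passes through unchanged.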
As aforementioned, researchers leveraging SLX may opt to model spillovers beyond first-order relationships. In this scenario, one need only derive higher-order forms of \(\mathbf{W}\) (e.g., \(\mathbf{W}^{2}\) for second-order neighbors) and compute the additional spatial lags, allowing for the expanded SLX model:
\[\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{WZ}\boldsymbol{\theta} + \mathbf{W}^{2}\mathbf{Z}\boldsymbol{\theta}_{2} + \ldots + \mathbf{W}^{n}\mathbf{Z}\boldsymbol{\theta}_{n} + \epsilon\]
Though these higher-order lags are easy to generate, they may entail high computation and storage costs at the modeling stage. Specifically, introducing more spatial lags increases the number of dimensions for each dataset, hampering fit speed and enlarging file sizes, whilst suggesting the implementation of individual models for each maximum order, \(n\). Moreover, as regards my research, scant evidence or theory exists on which to justifiably set \(n\) beyond \(n = 1\). Altogether, these considerations motivate me to limit my spatial lags to the first order, though doing so doesn’t foreclose the possibility of implementing higher-order lags as robustness checks.
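As a hedged illustration of a second-order lag, the Python sketch below reads \(\mathbf{W}^{2}\) as the plain matrix product of \(\mathbf{W}\) with itself, applied to the three-country example from earlier in this section. That reading is an assumption: other constructions of higher-order neighbor matrices exist (for instance, ones that exclude a unit from its own higher-order neighborhood), and nothing here fixes which would be used in a robustness check. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def spatial_lag(weights, z):
    """Weighted average of values for each unit (the product of weights and z)."""
    return [sum(w * value for w, value in zip(row, z)) for row in weights]


# Row-normalized first-order weights for countries i, j, k (i-j and j-k are neighbors).
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
z = [2.0, 4.0, 7.5]

W2 = matmul(W, W)
first_order = spatial_lag(W, z)
second_order = spatial_lag(W2, z)
print(W2)            # [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
print(second_order)  # [4.75, 4.0, 4.75]
```

Under this construction, each unit places weight on its own value at the second order (the diagonal of `W2` is nonzero), because a path of length two can lead back to where it began. Each additional order also adds one lagged column per covariate in \(\mathbf{Z}\), which is the source of the dimensionality cost noted above.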
To compute the spatial lags, \(\mathbf{WZ}\), I first obtain the spatial data needed to derive the weight matrix, \(\mathbf{W}\). These data are sourced from Natural Earth, a public domain map dataset containing shapefiles for all of the world’s countries and land objects, and accessed through the R package rnaturalearth.31 From these shapefiles, and for every year \(t\), I use the st_distance function from the R package sf to generate an \(N \times N\) matrix, wherein each element gives the minimum great-circle distance separating the borders of any two countries. The resultant matrix is structured such that, for countries \(i\) and \(j\) separated by a minimum distance \(d\):32
31 These data are available through the ne_countries function, specifically.
32 By default, st_distance gives \(d\) in meters.
- \(d_{i,i} = 0\) and \(d_{j,j} = 0\),
- \(d_{i,j} = 0\) and \(d_{j,i} = 0\) when \(i\) and \(j\) share contiguous borders, and
- \(d_{i,j} > 0\) and \(d_{j,i} > 0\) when \(i\) and \(j\) do not share contiguous borders.
Subsequently, I specify neighbors for each country-distance matrix under a binary neighborhood-membership criterion.33 Of course, one could define neighbors as any country pair sharing contiguous borders under a “rook” rule; but doing so is to presume that many countries lacking shared borders (e.g., island states) do not belong to any geographical “neighborhood” and are thus immune to spatial processes. This assumption seems unrealistic, so I adopt the distance-threshold rule of 200 kilometers suggested by Ward & Gleditsch (2007, p. 12) to decide neighborhood membership. Under this criterion, with respect to country \(i\), country \(j\) qualifies for membership in \(i\)’s neighborhood if \(d_{i,j} \leq 200 \text{ km}\). Every neighbor relationship being specified, I derive \(\mathbf{W}\) and compute the spatial lags, using the spdep R package functions nb2listw and lag.listw, respectively, to do so. For each chapter, \(\mathbf{Z}\) comprises the common covariates enumerated in Section 2.3.1, as well as any chapter-specific covariates.
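The path from a distance matrix to \(\mathbf{W}\) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the sf and spdep code itself: the four countries and their distances are hypothetical, distances are taken to be in meters, and a country with no neighbor inside the threshold is given an all-zero row, a choice made here for the example only.

```python
THRESHOLD_M = 200_000  # 200 km expressed in meters (the unit assumed for the distances)


def neighbor_matrix(distances, threshold=THRESHOLD_M):
    """Binary neighbor matrix: 1 if two distinct countries lie within the threshold."""
    n = len(distances)
    return [
        [1 if i != j and distances[i][j] <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


def row_normalize(binary):
    """Divide each row by its neighbor count; rows without neighbors stay all-zero."""
    weights = []
    for row in binary:
        count = sum(row)
        weights.append([value / count if count else 0.0 for value in row])
    return weights


# Hypothetical minimum border-to-border distances in meters for countries A, B, C, D.
distances = [
    [0, 0, 900_000, 2_500_000],            # A shares a border with B
    [0, 0, 150_000, 2_600_000],            # B lies 150 km from C
    [900_000, 150_000, 0, 3_000_000],
    [2_500_000, 2_600_000, 3_000_000, 0],  # D is a remote island
]

neighbors = neighbor_matrix(distances)
W = row_normalize(neighbors)
print(neighbors)
print(W)
```

The diagonal is excluded explicitly because a country's distance to itself is zero and would otherwise satisfy the threshold. B and C count as neighbors despite sharing no border, which a contiguity rule alone would not capture.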
33 Put differently, under the criterion, country \(i\) is classified with respect to country \(j\) as either a neighbor (1) or a non-neighbor (0). Ergo, no “partial” neighbor exists.
34 For example, for country \(i\) with neighbors \(j\), \(k\), and \(l\), and with respect to variable \(z\), let \(z_{j}\) be missing but \(z_{k}\) and \(z_{l}\) be non-missing. Using exclusively known data, \(\bar{z} = \tfrac{1}{2}(z_{k} + z_{l})\). Should \(z_{j}\) be imputed, however, this value may be introduced to the equation, in which case \(\bar{z} = \tfrac{1}{3}(z_{j} + z_{k} + z_{l})\). The second \(\bar{z}\) is likely to resemble the first, provided that \(z_{j}\) is not an outlier, since the underlying equations differ only by the presence of \(z_{j}\) (as well as corresponding adjustments to the averaging weight).
In calculating \(\mathbf{WZ}\), lag.listw produces NA values for lags where at least one neighbor’s information is missing. These values can be handled in one of two ways: by imputing them directly, or by first imputing missingness in \(\mathbf{Z}\), thus forming \((\mathbf{m}_{i})_{i=1}^5\), and subsequently computing \(\mathbf{WZ}\) for each \(\mathbf{m}_{i}\). The first option, relying solely on the multiple-imputation algorithm, generates values without enforcing the spatial constraints imposed by \(\mathbf{WZ}\), where each element must be an average of the values of country \(i\)’s neighbors. Consequently, the imputed figures are unlikely to approximate the mean of country \(i\)’s non-missing neighbors (a reasonable starting point for estimating the true value of the lags), and they may vary significantly by \(\mathbf{m}_{i}\). The second option, by contrast, respects \(\mathbf{WZ}\)’s spatial constraints, computing the lags more-or-less normally: namely, by directly considering known neighbor values, supplementing these inputs with imputations where necessary. As a result, these estimates hew closer to the mean of country \(i\)’s non-missing neighbors and are therefore more realistic.34 In short, the second method yields more consistent, accurate, and theoretically-valid estimates of the lags across each \(\mathbf{m}_{i}\). It is clearly preferable, so for each dataset \(\mathbf{S}\), I proceed by finding \(\mathbf{WZ}\) from imputed datasets \((\mathbf{m}_{i})_{i=1}^5\). These lags are then appended to each \(\mathbf{m}_{i}\), making five expanded imputed datasets, \((\mathbf{p}_{i})_{i=1}^5\), where \(\mathbf{p}_{i} = [\mathbf{Y \; D \; X \; WZ}] \in \mathbb{R}^{N \times (1 + w + k + v)}\).
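The difference between lagging before and after imputation can be seen in a minimal Python sketch. It is an illustration only: the neighbor structure, the observed values, and the five imputed values of \(z_{j}\) are all hypothetical, and `spatial_lag` is a stand-in helper that mimics the missing-value propagation described above rather than a reproduction of lag.listw.

```python
def spatial_lag(weights, z):
    """Neighbor-weighted averages; a lag is None if any contributing value is missing."""
    lags = []
    for row in weights:
        contributing = [(w, value) for w, value in zip(row, z) if w > 0]
        if any(value is None for _, value in contributing):
            lags.append(None)
        else:
            lags.append(sum(w * value for w, value in contributing))
    return lags


# Country i (index 0) has neighbors j, k, and l; each of them neighbors only i.
W = [
    [0.0, 1 / 3, 1 / 3, 1 / 3],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
z_observed = [5.0, None, 3.0, 6.0]  # z_j is missing

# Lagging before imputing: the lag for country i is missing as well.
lag_before = spatial_lag(W, z_observed)

# Imputing first: hypothetical imputed values of z_j, one per imputed dataset.
imputed_z_j = [4.2, 4.8, 3.9, 5.1, 4.5]
lags_after = []
for value in imputed_z_j:
    z_completed = [5.0, value, 3.0, 6.0]
    lags_after.append(spatial_lag(W, z_completed)[0])

known_neighbor_mean = (3.0 + 6.0) / 2  # mean of the two observed neighbors
print(lag_before)
print(lags_after, known_neighbor_mean)
```

With these hypothetical values, the lag for country \(i\) stays between roughly 4.3 and 4.7 across the five completed datasets, close to the known-neighbor mean of 4.5, because the imputed value enters with a weight of only one third.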
2.3.4 Start-Year Specification
Upon completing the spatial lags, I specify the “start years” (the year at which each \(\mathbf{p}_{i}\) may begin) where necessary. The exact justifications for these start years may be found in Section 3, Section 4, and Section 5; but at this juncture, a few clarifying points merit mentioning. First, these start years serve one of two purposes: to align with the start years adopted by the authors inspiring my work, or to test alternative theories in which another start year may plausibly yield different results. Second, the maximum number of start years set for any given chapter is two: although data availability generally allowed for a third start year beginning before those used by the authors, introducing it in preliminary model testing revealed prohibitive computational costs.35 Third, Chapter 3 specifies 1990 alone as the start year, since setting start years more recent than this entails inferential risks.36 Ultimately, once the start years are specified, there exist iterations \(\mathbf{r}_{i,j}\) for each \(\mathbf{p}_{i}\), where \(j\) signifies the index number of the start years. In chapters 1 and 2, two start years are specified, such that \(j \in \{1, 2\}\). By contrast, in Chapter 3, there is no variation in start years, so \(j = 1\).
35 To be precise, running a task with three start years usually led to memory limits being reached, after which the task would fail.
36 Indeed, because the universe of cases in this chapter is already limited to “Global South” countries, and in consideration of subsequent temporal lagging, a later start year would very likely give rise to violations of Rubin’s pooling rules.
2.3.5 Temporal Lagging
The final preprocessing step involves temporally lagging all independent variables—the treatments (\(\mathbf{D}\)), covariates (\(\mathbf{X}\)), and spatial lags (\(\mathbf{WZ}\))—for each \(\mathbf{r}_{i,j}\) with respect to the outcome (\(\mathbf{Y}\)).